<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.7">
 <TITLE>SDCC Compiler User Guide: Optimizations</TITLE>
 <LINK HREF="SDCCUdoc-7.html" REL=next>
 <LINK HREF="SDCCUdoc-5.html" REL=previous>
 <LINK HREF="SDCCUdoc.html#toc6" REL=contents>
</HEAD>
<BODY>
<A HREF="SDCCUdoc-7.html">Next</A>
<A HREF="SDCCUdoc-5.html">Previous</A>
<A HREF="SDCCUdoc.html#toc6">Contents</A>
<HR>
<H2><A NAME="Optimizations"></A> <A NAME="s6">6. Optimizations</A> </H2>

<P>SDCC performs a a host of standard optimizations in addition to some MCU
specific optimizations. 
<H2><A NAME="Sub-expression Elimination"></A> <A NAME="ss6.1">6.1 Sub-expression elimination</A>
 </H2>

<P>The compiler does local and global common subexpression elimination.
<P><CODE>eg. </CODE>
<P>
<PRE>
i = x + y + 1; 
j = x + y;
 
</PRE>
<P>will be translated to
<P>
<PRE>
iTemp = x + y 
i = iTemp + 1 
j = iTemp
 
</PRE>
<P>Some subexpressions are not as obvious as the above example.
<P>eg.
<P>
<PRE>
a-&gt;b[i].c = 10; 
a-&gt;b[i].d = 11;
 
</PRE>
<P>In this case the address arithmetic a-&gt;b[i] will be computed
only once; the equivalent code in C would be.
<P>
<PRE>
iTemp = a-&gt;b[i]; 
iTemp.c = 10; 
iTemp.d = 11;
 
</PRE>
<P>The compiler will try to keep these temporary variables in registers.
<H2><A NAME="Dead-code elimination"></A> <A NAME="ss6.2">6.2 Dead-Code elimination.</A>
 </H2>

<P>eg.
<P>
<PRE>
int global; 
void f () { 
  int i; 
  i = 1;    /* dead store */ 
  global
 = 1; /* dead store */ 
  global = 2; 
  return; 
  global = 3; /* unreachable
 */ 
}
 
</PRE>
<P>will be changed to
<P>
<PRE>
int global; void f () 
{     
 global = 2;     
 return; 
}
 
</PRE>
<H2><A NAME="Copy-Propagation"></A> <A NAME="ss6.3">6.3 Copy-Propagation:</A>
 </H2>

<P>eg.
<P>
<PRE>
int f() { 
   int i, j; 
   i = 10; 
   j = i; 
   return j; 
}
 
</PRE>
<P>will be changed to 
<P>
<PRE>
int f() { 
    int i,j; 
    i = 10; 
    j = 10; 
    return 10;
 
}
 
</PRE>
<P>Note: the dead stores created by this copy propagation will be eliminated
by dead-code elimination .
<H2><A NAME="Loop Optimizations"></A> <A NAME="ss6.4">6.4 Loop optimizations</A>
 </H2>

<P>Two types of loop optimizations are done by SDCC loop invariant lifting
and strength reduction of loop induction variables.In addition to the strength
reduction the optimizer marks the induction variables and the register allocator
tries to keep the induction variables in registers for the duration of the
loop. Because of this preference of the register allocator , loop induction
optimization causes an increase in register pressure, which may cause unwanted
spilling of other temporary variables into the stack / data space . The compiler
will generate a warning message when it is forced to allocate extra space either
on the stack or data space. If this extra space allocation is undesirable then
induction optimization can be eliminated either for the entire source file
( with --noinduction option) or for a given function only (#pragma NOINDUCTION).
<H3><A NAME="Loop Invariant"></A> Loop Invariant: </H3>

<P>eg
<P>
<PRE>
for (i = 0 ; i &lt; 100 ; i ++) 
     f += k + l;
 
</PRE>
<P>changed to
<P>
<PRE>
itemp = k + l; 
for ( i = 0; i &lt; 100; i++ ) f += itemp;
 
</PRE>
<P>As mentioned previously some loop invariants are not as apparent, all static
address computations are also moved out of the loop.
<H3><A NAME="Strength Reduction"></A> Strength reduction : </H3>

<P>This optimization substitutes an expression by a cheaper expression.
<P>eg.
<P>
<PRE>
for (i=0;i &lt; 100; i++) ar[i*5] = i*3;
 
</PRE>
<P>changed to
<P>
<PRE>
itemp1 = 0; 
itemp2 = 0; 
for (i=0;i&lt; 100;i++) { 
     ar[itemp1]
 = itemp2; 
     itemp1 += 5; 
     itemp2 += 3; 
}
 
</PRE>
<P>The more expensive multiplication is changed to a less expensive addition.
<H3><A NAME="Loop reversing"></A> Loop reversing: </H3>

<P>This optimization is done to reduce the overhead of checking loop boundaries
for every iteration. Some simple loops can be reversed and implemented using
a "decrement and jump if not zero" instruction. SDCC checks for the following
criterion to determine if a loop is reversible (note: more sophisticated compiers
use data-dependency analysis to make this determination, SDCC uses a more simple
minded analysis).
<P>
<UL>
<LI>The 'for' loop is of the form 
"for ( &lt;symbol&gt; = &lt;expression&gt;
; &lt;sym&gt; [&lt; | &lt;=] &lt;expression&gt; ; [&lt;sym&gt;++
| &lt;sym&gt; += 1])
&lt;for body&gt;"</LI>
<LI>The &lt;for body&gt; does not contain "continue" or 'break".</LI>
<LI>All goto's are contained within the loop.</LI>
<LI>No function calls within the loop.</LI>
<LI>The loop control variable &lt;sym&gt; is not assigned any value within
the loop</LI>
<LI>The loop control variable does NOT participate in any arithmetic operation
within the loop.</LI>
<LI>There are NO switch statements in the loop.</LI>
</UL>
<P>Note djnz instruction can be used for 8-bit values ONLY, therefore it is
advantageous to declare loop control symbols as either 'char' or 'short', ofcourse
this may not be possible on all situations.
<H2><A NAME="Algebraic Simplifications"></A> <A NAME="ss6.5">6.5 Algebraic simplifications:</A>
 </H2>

<P>SDCC does numerous algebraic simplifications, the following is a small
sub-set of these optimizations.
<P>
<PRE>
eg 
i = j + 0 ; /* changed to */ i = j; 
i /= 2; /* changed to */ i &gt;&gt;=
 1; 
i = j - j ; /* changed to */ i = 0; 
i = j / 1 ; /* changed to */ i = j;
 
</PRE>
<P>Note the subexpressions given above are generally introduced by macro expansions
or as a result of copy/constant propagation.
<H2><A NAME="Switch Statement"></A> <A NAME="ss6.6">6.6 'switch' statements.</A>
 </H2>

<P>SDCC changes switch statements to jump tables when the following conditions
are true. 
<P>
<UL>
<LI>The case labels are in numerical sequence , the labels need not be in order,
and the starting number need not be one or zero.</LI>
</UL>
<P>eg 
<P>
<PRE>
switch(i) {                         switch (i) { 
case 4:...
                          case 1: ... 
case 5:...                          case
 2: ... 
case 3:...                          case 3: ... 
case 6:...        
                  case 4: ... 
}                                   }
 
</PRE>
<P>Both the above switch statements will be implemented using a jump-table.
<P>
<UL>
<LI>The number of case labels is at least three, since it takes two conditional
statements to handle the boundary conditions.</LI>
<LI>The number of case labels is less than 84, since each label takes 3 bytes
and a jump-table can be utmost 256 bytes long. </LI>
</UL>
<P>Switch statements which have gaps in the numeric sequence or those that
have more that 84 case labels can be split into more than one switch statement
for efficient code generation.
<P>eg
<P>
<PRE>
switch (i) { 
case 1: ... 
case 2: ... 
case 3: ... 
case 4: ... 
case
 9: ... 
case 10: ... 
case 11: ... 
case 12: ... 
}
 
</PRE>
<P>If the above switch statement is broken down into two switch statements
<P>
<PRE>
switch (i) { 
case 1: ... 
case 2: ... 
case 3: ... 
case 4: ... 
}switch (i) { 
case 9: ... 
case 10: ... 
case 11: ... 
case 12:...
 
}
 
</PRE>
<P>then both the switch statements will be implemented using jump-tables whereas
the unmodified switch statement will not be .
<H2><A NAME="bit shifting"></A> <A NAME="ss6.7">6.7 bit-shifting operations.</A>
 </H2>

<P>Bit shifting is one of the most frequently used operation in embedded programming
. SDCC tries to implement bit-shift operations in the most efficient way possible.
<P>eg.
<P>
<PRE>
unsigned short i;... 
i&gt;&gt;= 4; 
..
 
</PRE>
<P>generates the following code.
<P>
<PRE>
mov a,_i 
swap a 
anl a,#0x0f 
mov _i,a
 
</PRE>
<P>In general SDCC will never setup a loop if the shift count is known. Another
example
<P>
<PRE>
unsigned int i; 
... 
i &gt;&gt;= 9; 
...
 
</PRE>
<P>will generate
<P>
<PRE>
mov a,(_i + 1) 
mov (_i + 1),#0x00 
clr c 
rrc a 
mov _i,a
 
</PRE>
<P>Note that SDCC stores numbers in little-endian format (i.e. lowest order
first)
<H3><A NAME="bit rotation"></A> Bit-rotation: </H3>

<P>A special case of the bit-shift operation is bit rotation, SDCC recognizes
the following expression to be a left bit-rotation.
<P>
<PRE>
unsigned char i; 
... 
i = ( ( i &lt;&lt; 1) | ( i &gt;&gt; 7)); 
...
 
</PRE>
<P>will generate the following code.
<P>
<PRE>
mov a,_i 
rl a 
mov _i,a
 
</PRE>
<P>SDCC uses pattern matching on the parse tree to determine this operation
.Variations of this case will also be recognized as bit-rotation i.e i = ((i
&gt;&gt; 7) | (i &lt;&lt; 1)); /* left-bit rotation */
<H2><A NAME="Highest Order Bit"></A> <A NAME="ss6.8">6.8 Highest Order Bit.</A>
 </H2>

<P>It is frequently required to obtain the highest order bit of an integral
type (int,long,short or char types). SDCC recognizes the following expression
to yield the highest order bit and generates optimized code for it.
<P>
<PRE>
eg 
unsigned int gint; 
foo () { 
unsigned char hob; 
   ... 
   hob
 = (gint &gt;&gt; 15) &amp; 1; 
   .. 
}
 
</PRE>
<P>Will generate the following code.
<P>
<PRE>
                             61 ;  hob.c 7 
   000A E5*01               
 62         mov  a,(_gint + 1) 
   000C 33                   63         rlc 
 a 
   000D E4                   64         clr  a 
   000E 13                  
 65         rrc  a 
   000F F5*02                66         mov  _foo_hob_1_1,a
 
</PRE>
<P>Variations of this case however will NOT be recognized . It is a standard
C expression , so I heartily recommend this be the only way to get the highest
order bit, (it is portable). Of course it will be recognized even if it is
embedded in other expressions.
<P>
<PRE>
eg.xyz = gint + ((gint &gt;&gt; 15) &amp; 1);
 
</PRE>
<P>will still be recognized.
<H2><A NAME="Peep-Hole"></A> <A NAME="ss6.9">6.9 Peep-hole optimizer.</A>
 </H2>

<P>The compiler uses a rule based , pattern matching and re-writing mechanism
for peep-hole optimization . It is inspired by 'copt' a peep-hole optimizer
by Christopher W. Fraser (cwfraser@microsoft.com). A default set of rules are
compiled into the compiler, additional rules may be added with the --peep-file
&lt;filename&gt; option. The rule language is best illustrated with examples.
<P>
<PRE>
replace { 
mov %1,a 
mov a,%1 } by { mov %1,a
 }
 
</PRE>
<P>The above rule will the following assembly sequence
<P>
<PRE>
mov r1,a 
mov a,r1
 
</PRE>
<P>to
<P>
<PRE>
mov r1,a
 
</PRE>
<P>Note: All occurrences of a '%n' ( pattern variable ) must denote
the same string. With the above rule, the assembly sequence
<P>
<PRE>
mov r1,a 
mov a,r2
 
</PRE>
<P>will remain unmodified. Other special case optimizations may be added by
the user (via --peep-file option), eg. some variants of the 8051 MCU allow
only 'AJMP' and 'ACALL' , the following two rules will change all 'LJMP' &amp;
'LCALL' to 'AJMP' &amp; 'ACALL'.
<P>
<PRE>
replace { lcall %1 } by { acall %1 }
 
replace { ljmp %1 } by { ajmp %1 }
 
</PRE>
<P>The inline-assembler' code is also passed through the peep hole optimizer,
thus the peephole optimizer can also be used as an assembly level macro expander.
The rules themselves are MCU dependent whereas the rule language infra-structure
is MCU independent. Peephole optimization rules for other MCU can be easily
programmed using the rule language.
<P>The syntax for a rule is as follows ,
<P>
<PRE>
rule := replace [ restart ] '{' &lt;assembly sequence&gt;
 '\n' 
                            '}' by '{' '\n' 
   
                             &lt;assembly sequence&gt; '\n' 
         
                   '}' [if &lt;functionName&gt; ] '\n' 
&lt;assembly
 sequence&gt; := assembly instruction (each instruction including labels must
 be on a separate line).   
 
</PRE>
<P>The optimizer will apply to the rules one by one from the top in the sequence
of their appearance, it will terminate when all rules are exhausted. If the
'restart' option is specified, then the optimizer will start matching the rules
again from the top, this option for a rule is expensive (performance), it is
intended to be used in situations where a transformation will trigger the same
rule again. A good example of this the following rule.
<P>
<PRE>
replace restart { 
pop %1 
push %1 } by {
 
; nop 
}
 
</PRE>
<P>Note that the replace pattern cannot be a blank, but can be a comment line.
Without the 'restart' option only the inner most 'pop' 'push' pair would be
eliminated. i.e.
<P>
<PRE>
pop ar1 
pop ar2 
push ar2 
push ar1
 
</PRE>
<P>would result in
<P>
<PRE>
pop ar1 
; nop 
push ar1
 
</PRE>
<P>with the 'restart' option the rule will be applied again to the resulting
code and the all the 'pop' 'push' pairs will be eliminated to yield
<P>
<PRE>
; nop 
; nop
 
</PRE>
<P>A conditional function can be attached to a rule. Attaching rules are somewhat
more involved, let me illustrate this with an example.
<P>
<PRE>
replace { 
     ljmp %5 
%2:} by { 
     sjmp
 %5 
%2:} if labelInRange
 
</PRE>
<P>The optimizer does a look-up of a function name table defined in function
'callFuncByName' in the source file SDCCpeeph.c , with the name 'labelInRange',
if it finds a corresponding entry the function is called. Note there can be
no parameters specified for these functions, in this case the use of '%5'
is crucial, since the function labelInRange expects to find the label in that
particular variable (the hash table containing the variable bindings is passed
as a parameter). If you want to code more such functions , take a close look
at the function labelInRange and the calling mechanism in source file SDCCpeeph.c.
I know this whole thing is a little kludgey , may be some day we will have
some better means. If you are looking at this file, you will also see the default
rules that are compiled into the compiler, you can your own rules in the default
set there if you get tired of specifying the --peep-file option.
<HR>
<A HREF="SDCCUdoc-7.html">Next</A>
<A HREF="SDCCUdoc-5.html">Previous</A>
<A HREF="SDCCUdoc.html#toc6">Contents</A>
</BODY>
</HTML>
